Image compression and decoding, video compression and decoding: training methods and training systems

ABSTRACT

A computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the method of training using a plurality of scales for input images from the set of training images. In an embodiment, a blindspot network b_α is trained which generates an output image x̃ from an input image x. Related computer systems, computer program products and computer-implemented methods of training are disclosed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a bypass continuation of International Application No. PCT/GB2021/051858, filed on Jul. 20, 2021, which claims priority to GB Application No. GB2011176.1, filed on Jul. 20, 2020; U.S. Application No. 63/053,807, filed on Jul. 20, 2020; GB Application No. GB2012461.6, filed on Aug. 11, 2020; GB Application No. 2012462.4, filed on Aug. 11, 2020; GB Application No. 2012163.2, filed on Aug. 11, 2020; GB Application No. 2012465.7, filed on Aug. 11, 2020; GB Application No. GB2012467.3, filed on Aug. 11, 2020; GB Application No. 2012468.1, filed on Aug. 11, 2020; GB Application No. GB2012469.9, filed on Aug. 11, 2020; GB Application No. GB2016824.1, filed on Oct. 23, 2020; GB Application No. GB2019531.9, filed on Dec. 10, 2020; and International Application No. PCT/GB2021/051041, filed on Apr. 29, 2021, the entire contents of which are fully incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The field of the invention relates to computer-implemented training methods and training systems for an image generative network, e.g. one for image compression and decoding, and to related computer-implemented methods and systems for image generation, e.g. image compression and decoding, and to related computer-implemented methods and systems for video generation, e.g. video compression and decoding.

2. Technical Background

There is increasing demand from users of communications networks for images and video content. Demand is increasing not just for the number of images viewed and for the playing time of video; demand is also increasing for higher resolution, lower distortion content, if it can be provided.

When images are compressed at a source device in a lossy way, this can lead to distortions or artifacts when the images are decompressed at a recipient device. It is desirable to train image encoders and decoders so that distortions or artifacts are minimized when the images are decompressed.

3. Discussion of Related Art

U.S. Ser. No. 10/373,300B1 discloses a system and method for lossy image and video compression and transmission that utilizes a neural network as a function to map a known noise image to a desired or target image, allowing the transfer only of hyperparameters of the function instead of a compressed version of the image itself. This allows the recreation of a high-quality approximation of the desired image by any system receiving the hyperparameters, provided that the receiving system possesses the same noise image and a similar neural network. The amount of data required to transfer an image of a given quality is dramatically reduced versus existing image compression technology. Since video is simply a series of images, the application of this image compression system and method allows the transfer of video content at rates greater than previous technologies in relation to the same image quality.

U.S. Ser. No. 10/489,936B1 discloses a system and method for lossy image and video compression that utilizes a metanetwork to generate a set of hyperparameters necessary for an image encoding network to reconstruct the desired image from a given noise image.

Application PCT/GB2021/051041, which is incorporated by reference, discloses methods and systems for image compression and decoding, and for video compression and decoding.

SUMMARY OF THE INVENTION

According to a first aspect of the invention, there is provided a computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the method of training using a plurality of scales for input images from the set of training images, the method including the steps of:

(i) receiving an input image x of the set of training images and generating one or more images which are derived from x to make a multiscale set of images {x_i} which includes x;

(ii) the image generative network f_θ generating an output image x̂_i from an input image x_i ∈ {x_i}, without tracking gradients for f_θ;

(iii) the proxy network outputting an approximated function output ŷ_i, using the x_i and the x̂_i as inputs;

(iv) the gradient intractable perceptual metric outputting a function output y_i, using the x_i and the x̂_i as inputs;

(v) evaluating a loss for the proxy network, using the y_i and the ŷ_i as inputs, and including the evaluated loss for the proxy network in a loss array for the proxy network;

(vi) repeating steps (ii) to (v) for all the images x_i in the multiscale set of images {x_i};

(vii) using backpropagation to compute gradients of parameters of the proxy network with respect to an aggregation of the loss array assembled in executions of step (v);

(viii) optimizing the parameters of the proxy network based on the results of step (vii), to provide an optimized proxy network;

(ix) the image generative network f_θ generating an output image x̂_i from an input image x_i ∈ {x_i};

(x) the optimized proxy network outputting an optimized approximated function output ŷ_i, using the x_i and the x̂_i as inputs;

(xi) evaluating a loss for the generative network f_θ, using the x_i, the x̂_i and the optimized approximated function output ŷ_i as inputs, and including the evaluated loss for the generative network f_θ in a loss array for the generative network f_θ;

(xii) repeating steps (ix) to (xi) for all the images x_i in the multiscale set of images {x_i};

(xiii) using backpropagation to compute gradients of parameters of the generative network f_θ with respect to an aggregation of the loss array assembled in executions of step (xi);

(xiv) optimizing the parameters of the generative network f_θ based on the results of step (xiii), to provide an optimized generative network f_θ, and

(xv) repeating steps (i) to (xiv) for each member of the set of training images.
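The alternating procedure of steps (i) to (xiv) can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the method as implemented. It assumes NumPy and replaces both networks with hypothetical few-parameter functions (`generator`, `proxy`). A hypothetical rounded error score (`metric`) stands in for the gradient intractable perceptual metric, 2×2 average pooling serves as the downsampling operator, and each loss array is aggregated by its mean. Central finite differences (`num_grad`) stand in for backpropagation, so the sketch runs without a deep-learning framework.

```python
import numpy as np

def downsample2(img):
    # 2x2 average pooling (assumed downsampling operator)
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_set(x, n_scales=3):
    # step (i): {x_i} holds x plus progressively downsampled copies
    scales = [x]
    for _ in range(n_scales - 1):
        scales.append(downsample2(scales[-1]))
    return scales

def generator(theta, x):
    # hypothetical toy f_theta: per-pixel affine map
    return theta[0] * x + theta[1]

def metric(x, x_hat):
    # hypothetical stand-in metric (higher is better); rounding makes it
    # piecewise constant, so it offers no useful gradient
    return -np.round(np.mean(np.abs(x - x_hat)), 2)

def proxy(phi, x, x_hat):
    # hypothetical toy proxy: linear in two error features plus a bias
    d = x - x_hat
    return phi[0] * np.mean(d ** 2) + phi[1] * np.mean(np.abs(d)) + phi[2]

def num_grad(loss_fn, params, eps=1e-5):
    # central finite differences, standing in for backpropagation
    grad = np.zeros_like(params)
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps
        grad[k] = (loss_fn(params + step) - loss_fn(params - step)) / (2 * eps)
    return grad

def train_step(x, theta, phi, lr_phi=0.5, lr_theta=0.05, n_scales=3):
    xs = multiscale_set(x, n_scales)
    # steps (ii)-(vi): x_hat_i computed once with theta fixed (no gradients for f_theta)
    pairs = [(xi, generator(theta, xi)) for xi in xs]
    targets = [metric(xi, xh) for xi, xh in pairs]

    def proxy_loss(p):
        # aggregate (mean) of the per-scale squared errors in the loss array
        return np.mean([(proxy(p, xi, xh) - y) ** 2
                        for (xi, xh), y in zip(pairs, targets)])

    # steps (vii)-(viii): separate optimizer (plain gradient descent) for the proxy
    phi = phi - lr_phi * num_grad(proxy_loss, phi)

    def gen_loss(t):
        # steps (ix)-(xii): maximise the mean optimized-proxy score over scales
        return -np.mean([proxy(phi, xi, generator(t, xi)) for xi in xs])

    # steps (xiii)-(xiv): gradient step for the generator
    theta = theta - lr_theta * num_grad(gen_loss, theta)
    return theta, phi
```

Step (xv) corresponds to calling `train_step` once per training image. The sketch computes x̂_i once, with θ held fixed, before the proxy update, so no gradient reaches f_θ in the proxy phase. The generator phase then differentiates only through the updated proxy and never through `metric`.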

An advantage is that the multiscale set of images provides improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation. An advantage is that, within the field of learned image and video compression, the method allows networks to train with non-differentiable perceptual metrics.

The method may be one wherein steps (ii) to (xv) are repeated for the set of training images, to train the generative network f_θ and to train the proxy network. An advantage is improved training of the image generative network. An advantage is improved training of the proxy network.

The method may be one including the step (xvi) of storing the parameters of the trained generative network f_θ and the parameters of the trained proxy network.

The method may be one wherein the one or more images which are derived from x to make a multiscale set of images {x_i} are derived by downsampling. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein the image generative network f_θ is a neural network.

The method may be one wherein the rate of change of every parameter in the network is computed with respect to its associated loss, and the parameters are updated in such a way as to either minimise or maximise the associated loss.

The method may be one wherein the proxy network is a neural network.

The method may be one wherein the proxy network is robust against adversarial examples.

The method may be one wherein the gradients of the proxy network are treated as a noisy differentiable relaxation of the intractable gradients of the gradient intractable perceptual metric.

The method may be one wherein, when the pre-trained proxy network is frozen, the generative network f_θ will learn to produce examples outside of the learnt boundary of the proxy network.

The method may be one wherein a training of the proxy network ĥ_ϕ involves samples of f_θ(x) and x, but does not require gradients for x̂.

The method may be one wherein the generative network f_θ includes an encoder, which encodes (by performing lossy encoding) an input image x into a bitstream, and includes a decoder, which decodes the bitstream into an output image x̂.

The method may be one wherein the method includes an iteration of a training pass of the generative network and a training pass of the proxy network. An advantage is improved stability during the training.

The method may be one wherein the generative and proxy networks have separate optimizers. An advantage is improved stability during the training.

The method may be one wherein, for the case of proxy network optimization, gradients do not flow through the generative network.

The method may be one wherein the method is used for learned image compression.

The method may be one wherein the number of input and output parameters to the gradient intractable perceptual metric is arbitrary.

The method may be one wherein the gradient intractable perceptual metric is a perceptual loss function.

The method may be one wherein the gradient intractable perceptual metric is VMAF, VIF, DLM or IFC, or a mutual information based estimator.

The method may be one wherein the generative network includes a compression network, wherein a term is added to the total loss of the compression network to stabilise the initial training of the compression network. An advantage is improved stability during the initial training.

The method may be one wherein the generative loss includes a generic distortion loss which includes one or more stabilisation terms. An advantage is improved stability during the training.

The method may be one wherein the stabilisation terms include Mean Squared Error (MSE) or a combination of analytical losses with weighted deep-embeddings of a pre-trained neural network. An advantage is improved stability during the training.
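A generator loss of this form, a perceptual term from the proxy plus an MSE stabilisation term, might be written as below. This is a sketch under stated assumptions. It uses NumPy and assumes that a higher proxy score means better quality, so the mean score is negated in order to be minimised. The weight `lam` is hypothetical, and the text does not fix any of these choices.

```python
import numpy as np

def generative_loss(x_scales, xhat_scales, proxy_scores, lam=0.1):
    # perceptual term: maximise the mean proxy score by minimising its negative
    perceptual = -float(np.mean(proxy_scores))
    # stabilisation term: MSE averaged over all scales (lam is a hypothetical weight)
    mse = float(np.mean([np.mean((x - xh) ** 2)
                         for x, xh in zip(x_scales, xhat_scales)]))
    return perceptual + lam * mse
```

Setting `lam` to zero recovers a purely proxy-driven loss. A larger value keeps x̂ closer to x in the pixel domain, which is one plausible reading of the stabilising role described for these terms.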

The method may be one wherein a receptive field covers a larger portion of an image for a downsampled input image. An advantage is that the method is more robust against adversarial samples. An advantage is that the method is more robust against artifact generation.

The method may be one wherein a perceptual quality score is assigned to the image at each scale and the scores are aggregated by an aggregation function. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein a user is able to select a number of scales to use in the multiscale set of images. An advantage is that the thoroughness of the training is user selectable.

The method may be one wherein the set of images includes a downsampled image that has been downsampled by a factor of two in each dimension.

The method may be one wherein the set of images includes a downsampled image that has been downsampled by a factor of four in each dimension.
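The following is a short sketch of how the multiscale set might be built with factor-two and factor-four downsampling, and how the per-scale scores might be aggregated. It uses NumPy and assumes block averaging as the downsampling operator, with the mean as the default aggregation function. The helper names are hypothetical. The `factors` tuple stands in for the user-selectable number of scales.

```python
import numpy as np

def downsample(img, factor):
    # block averaging by `factor` in each spatial dimension (assumed operator);
    # works for greyscale (H, W) and RGB (H, W, 3) arrays
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def multiscale_set(x, factors=(2, 4)):
    # {x_i} always contains x itself; each factor adds one coarser scale
    return [x] + [downsample(x, f) for f in factors]

def aggregate(scores, fn=np.mean):
    # aggregation function over the per-scale perceptual quality scores
    return float(fn(scores))
```

For an 8×8 input, the set holds 8×8, 4×4 and 2×2 images. An RGB input keeps its channel axis at every scale.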

The method may be one wherein the mean of the ŷ_i is used to train the image generative network by attempting to maximise or minimise the mean of the ŷ_i using stochastic gradient descent. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein the function outputs y_i of the perceptual metric are used to train the proxy network, using stochastic gradient descent, to force its predictions ŷ_i to be closer to the output of the perceptual metric. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein, for each image x, an RGB image is provided.

According to a second aspect of the invention, there is provided a computer system configured to train an image generative network f_θ for a set of training images, in which the system generates an output image x̂ from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, wherein the computer system is configured to:

(i) receive an input image x from the set of training images and generate one or more images which are derived from x to make a multiscale set of images {x_i} which includes x;

(ii) use the image generative network f_θ to generate an output image x̂_i from an input image x_i ∈ {x_i}, without tracking gradients for f_θ;

(iii) use the proxy network to output an approximated function output ŷ_i, using the x_i and the x̂_i as inputs;

(iv) use the gradient intractable perceptual metric to output a function output y_i, using the x_i and the x̂_i as inputs;

(v) evaluate a loss for the proxy network, using the y_i and the ŷ_i as inputs, and to include the evaluated loss for the proxy network in a loss array for the proxy network;

(vi) repeat (ii) to (v) for all the images x_i in the multiscale set of images {x_i};

(vii) use backpropagation to compute gradients of parameters of the proxy network with respect to an aggregation of the loss array assembled in executions of (v);

(viii) optimize the parameters of the proxy network based on the results of (vii), to provide an optimized proxy network;

(ix) use the image generative network f_θ to generate an output image x̂_i from an input image x_i ∈ {x_i};

(x) use the optimized proxy network to output an optimized approximated function output ŷ_i, using the x_i and the x̂_i as inputs;

(xi) evaluate a loss for the generative network f_θ, using the x_i, the x̂_i and the optimized approximated function output ŷ_i as inputs, and to include the evaluated loss for the generative network f_θ in a loss array for the generative network f_θ;

(xii) repeat (ix) to (xi) for all the images x_i in the multiscale set of images {x_i};

(xiii) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to an aggregation of the loss array assembled in executions of (xi);

(xiv) optimize the parameters of the generative network f_θ based on the results of (xiii), to provide an optimized generative network f_θ, and

(xv) repeat (i) to (xiv) for each member of the set of training images.

An advantage is that the multiscale set of images provides improved stability during training by the computer system. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The computer system may be one wherein (ii) to (xv) are repeated for the set of training images, to train the generative network f_θ and to train the proxy network.

The computer system may be configured to perform a method of any aspect of the first aspect of the invention.

According to a third aspect of the invention, there is provided a computer program product executable on a processor to train an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the computer program product executable to:

(i) receive an input image x of the set of training images and generate one or more images which are derived from x to make a multiscale set of images {x_i} which includes x;

(ii) use the image generative network f_θ to generate an output image x̂_i from an input image x_i ∈ {x_i}, without tracking gradients for f_θ;

(iii) use the proxy network to output an approximated function output ŷ_i, using the x_i and the x̂_i as inputs;

(iv) use the gradient intractable perceptual metric to output a function output y_i, using the x_i and the x̂_i as inputs;

(v) evaluate a loss for the proxy network, using the y_i and the ŷ_i as inputs, and to include the evaluated loss for the proxy network in a loss array for the proxy network;

(vi) repeat (ii) to (v) for all the images x_i in the multiscale set of images {x_i};

(vii) use backpropagation to compute gradients of parameters of the proxy network with respect to an aggregation of the loss array assembled in executions of (v);

(viii) optimize the parameters of the proxy network based on the results of (vii), to provide an optimized proxy network;

(ix) use the image generative network f_θ to generate an output image x̂_i from an input image x_i ∈ {x_i};

(x) use the optimized proxy network to output an optimized approximated function output ŷ_i, using the x_i and the x̂_i as inputs;

(xi) evaluate a loss for the generative network f_θ, using the x_i, the x̂_i and the optimized approximated function output ŷ_i as inputs, and to include the evaluated loss for the generative network f_θ in a loss array for the generative network f_θ;

(xii) repeat (ix) to (xi) for all the images x_i in the multiscale set of images {x_i};

(xiii) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to an aggregation of the loss array assembled in executions of (xi);

(xiv) optimize the parameters of the generative network f_θ based on the results of (xiii), to provide an optimized generative network f_θ, and

(xv) repeat (i) to (xiv) for each member of the set of training images.

An advantage is that the multiscale set of images provides improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The computer program product may be one wherein (ii) to (xv) are repeated for the set of training images, to train the generative network f_θ and to train the proxy network.

The computer program product may be one executable on the processor to perform a method of any aspect of the first aspect of the invention.

According to a fourth aspect of the invention, there is provided a computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the method of training using a plurality of scales for input images from the set of training images.

The method may be one including a method of any aspect of the first aspect of the invention.

An advantage is that the multiscale set of images provides improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

According to a fifth aspect of the invention, there is provided a system including a first computer system and a second computer system, the first computer system including a lossy encoder including a first trained neural network, the second computer system including a decoder including a second trained neural network, wherein the second computer system is in communication with the first computer system, the lossy encoder configured to produce a bitstream from an input image; the first computer system configured to transmit the bitstream to the second computer system, wherein the decoder is configured to decode the bitstream to produce an output image; wherein the first computer system in communication with the second computer system comprises a generative network, wherein the generative network is trained using a method according to any aspect of the first aspect of the invention.

An advantage is that the generative network is more robust against artifact generation. An advantage is that the generative network is more robust against adversarial samples.

The system may be one in which the system is for image or video compression, transmission and decoding, wherein

(i) the first computer system is configured to receive an input image;

(ii) the first computer system is configured to encode the input image using the first trained neural network, to produce a latent representation;

(iii) the first computer system is configured to quantize the latent representation to produce a quantized latent;

(iv) the first computer system is configured to entropy encode the quantized latent into a bitstream;

(v) the first computer system is configured to transmit the bitstream to the second computer system;

(vi) the second computer system is configured to entropy decode the bitstream to produce the quantized latent;

(vii) the second computer system is configured to use the second trained neural network to produce an output image from the quantized latent, wherein the output image is an approximation of the input image. Quantizing, entropy encoding and entropy decoding details are provided in PCT/GB2021/051041.
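The encode, quantize, entropy-encode, transmit and decode chain of steps (i) to (vii) can be illustrated with toy stand-ins. This sketch assumes NumPy and replaces both trained neural networks with identity maps. It quantizes by rounding to a hypothetical step size `STEP`, and uses `zlib` purely as a stand-in lossless coder. It is not the quantization or entropy coding scheme of PCT/GB2021/051041.

```python
import zlib
import numpy as np

STEP = 0.1  # hypothetical quantization step size

def first_computer_system(x):
    latent = x.astype(np.float32)                  # (ii) identity stand-in for the first trained network
    q = np.round(latent / STEP).astype(np.int16)   # (iii) quantize the latent
    bitstream = zlib.compress(q.tobytes())         # (iv) zlib stands in for entropy encoding
    return bitstream, q.shape                      # (v) transmitted: bitstream plus latent shape

def second_computer_system(bitstream, shape):
    # (vi) invert the stand-in entropy coder to recover the quantized latent
    q = np.frombuffer(zlib.decompress(bitstream), dtype=np.int16).reshape(shape)
    return q.astype(np.float32) * STEP             # (vii) identity stand-in for the second trained network
```

Because the only lossy step here is rounding, each pixel of the output differs from the input by at most half a quantization step. The trained networks would replace the two identity stand-ins.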

An advantage is that, for a fixed file size (“rate”), a reduced output image distortion is obtained.

The system may be one wherein the first computer system is a server, e.g. a dedicated server, e.g. a machine in the cloud with dedicated GPUs, e.g. Amazon Web Services, Microsoft Azure, etc., or any other cloud computing service.

The system may be one wherein the first computer system is a user device.

The system may be one wherein the user device is a laptop computer, a desktop computer, a tablet computer or a smart phone.

The system may be one wherein the first trained neural network includes a library installed on the first computer system.

The system may be one wherein the first trained neural network is parametrized by one or several convolution matrices Θ, or the first trained neural network is parametrized by a set of bias parameters, non-linearity parameters and convolution kernel/matrix parameters.

The system may be one wherein the second computer system is a recipient device.

The system may be one wherein the recipient device is a laptop computer, a desktop computer, a tablet computer, a smart TV or a smart phone.

The system may be one wherein the second trained neural network includes a library installed on the second computer system.

The system may be one wherein the second trained neural network is parametrized by one or several convolution matrices Ω, or the second trained neural network is parametrized by a set of bias parameters, non-linearity parameters and convolution kernel/matrix parameters.

According to a sixth aspect of the invention, there is provided a computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, in which a blindspot network b_α is trained which generates an output image x̃ from an input image x, in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, and in which a blindspot proxy network is trained for labelling blindspot samples, the method including the steps of:

(i) the blindspot network b_α generating an output image x̃ from an input image x of the set of training images;

(ii) the blindspot proxy network outputting a blindspot function output ỹ, using x and x̃ as inputs;

(iii) the gradient intractable perceptual metric outputting a function output y, using x and x̃ as inputs;

(iv) evaluating a loss for the blindspot network, using y and ỹ as inputs;

(v) using backpropagation to compute gradients of parameters of the blindspot network with respect to the loss evaluated in step (iv);

(vi) optimizing the parameters of the blindspot network based on the results of step (v), to provide an optimized blindspot network;

(vii) the image generative network f_θ generating an output image x̂ from an input image x, without tracking gradients for f_θ;

(viii) the proxy network outputting an approximated function output ŷ, using x and x̂ as inputs;

(ix) the gradient intractable perceptual metric outputting a function output y, using x and x̂ as inputs;

(x) evaluating a loss for the proxy network, using y and ŷ as inputs;

(xi) using backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in step (x);

(xii) optimizing the parameters of the proxy network based on the results of step (xi), to provide an optimized proxy network;

(xiii) the blindspot network b_α generating an output image x̃ from an input image x, without tracking gradients for b_α;

(xiv) the blindspot proxy network outputting a representation ỹ, using x and x̃ as inputs;

(xv) a blindspot label function outputting a labelled output y of a blindspot sample;

(xvi) evaluating a loss for the blindspot proxy network, using y and ỹ as inputs;

(xvii) using backpropagation to compute gradients of parameters of the blindspot proxy network with respect to the loss evaluated in step (xvi);

(xviii) optimizing the parameters of the blindspot proxy network based on the results of step (xvii), to provide an optimized blindspot proxy network;

(xix) the image generative network f_θ generating an output image x̂ from an input image x;

(xx) the optimized proxy network outputting an optimized approximated function output ŷ, using x and x̂ as inputs;

(xxi) evaluating a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs;

(xxii) using backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in step (xxi);

(xxiii) optimizing the parameters of the generative network f_θ based on the results of step (xxii), to provide an optimized generative network f_θ, and

(xxiv) repeating steps (i) to (xxiii) for each member of the set of training images.

An advantage is that the trained generative network is more robust against blind spots. An advantage is that the trained generative network is more robust against artifact generation. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation. An advantage is that, within the field of learned image and video compression, the method allows networks to train with non-differentiable perceptual metrics.

The method may be one in which the method is repeated for the set of training images to train the generative network f_θ, to train the blindspot network b_α, to train the blindspot proxy network and to train the proxy network.

The method may be one including the step of: (xxv) storing the parameters of the trained generative network f_θ, the parameters of the trained blindspot network b_α, the parameters of the trained blindspot proxy network and the parameters of the trained proxy network.

The method may be one wherein a plurality of scales are used for input image x. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein a regularisation term is added to the loss for the generative network f_θ, in which the term includes a function that penalises the generative network when predicting adversarial samples. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein a function that penalises the generative network when predicting adversarial samples is a pixelwise error for perceptual proxy losses. An advantage is improved stability during the training. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein the regularisation term acts as a deterrent for the generative model, and steers it away from finding a basin on the loss surface which satisfies the proxy and the target function, but which includes blind spot samples. An advantage is improved stability during the training.

The method may be one wherein the regularisation term forces the model to find another basin to settle in. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein the regularisation term produces high loss values for adversarial images. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.

The method may be one wherein a regularisation function is found by evaluating a set of loss functions on a set of adversarial samples, and selecting the loss function which produces the highest loss term. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.
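The selection rule (evaluate each candidate loss on a set of adversarial samples and keep the one giving the highest loss) can be sketched as follows. NumPy is assumed. The three candidate losses in `CANDIDATES` and the function name `select_regulariser` are hypothetical examples, not the set of loss functions used in practice.

```python
import numpy as np

# hypothetical candidate regularisation losses, each taking (x, x_hat)
CANDIDATES = {
    "mse": lambda x, xh: float(np.mean((x - xh) ** 2)),
    "mae": lambda x, xh: float(np.mean(np.abs(x - xh))),
    "max_abs": lambda x, xh: float(np.max(np.abs(x - xh))),
}

def select_regulariser(candidates, adversarial_pairs):
    # mean loss of each candidate over the (x, adversarial x_hat) pairs
    scores = {name: float(np.mean([fn(x, xh) for x, xh in adversarial_pairs]))
              for name, fn in candidates.items()}
    # keep the candidate that penalises the adversarial samples most heavily
    return max(scores, key=scores.get), scores
```

For a zero image whose adversarial counterpart has a single pixel set to 2.0, the candidates score 0.25, 0.125 and 2.0, so `max_abs` is selected. Raw loss values are compared directly here, following the text. If the candidates lie on very different scales, some normalisation would presumably be needed before comparing them.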

The method may be one wherein mitigation of blind spot samples is performed by training the proxy on samples with self-imposed labels, to force the network components to avoid the blind spot boundaries in the loss surface.

The method may be one wherein adversarial samples are collected, either from a model that is known to produce adversarial samples, or synthetically generated by an algorithm which adds noise or artefacts that resemble the artefacts seen on adversarial samples; the images in this stored set of adversarial images are each assigned a label, such that a respective label conveys to the blindspot proxy network that a respective sample is an undesired sample. An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation.
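The synthetic branch of this collection step (adding noise and artefacts to clean images and tagging each result with an "undesired" label) might look like the sketch below. The Gaussian noise, the blocky artefact pattern, the label value `UNDESIRED_LABEL` and all parameters are hypothetical choices. NumPy is assumed, and images are taken to be greyscale arrays in [0, 1] whose sides are divisible by `block`.

```python
import numpy as np

UNDESIRED_LABEL = 0.0  # hypothetical label value meaning "undesired sample"

def make_synthetic_adversarial(x, rng, noise_std=0.1, block=4):
    # additive Gaussian noise (assumed artefact model)
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    # blocky artefact: one random offset per block x block tile
    h, w = x.shape
    pattern = rng.uniform(-0.2, 0.2, size=(h // block, w // block))
    noisy = noisy + np.kron(pattern, np.ones((block, block)))
    return np.clip(noisy, 0.0, 1.0), UNDESIRED_LABEL

def build_adversarial_set(images, seed=0):
    # stored set of (adversarial image, label) pairs for the blindspot proxy network
    rng = np.random.default_rng(seed)
    return [make_synthetic_adversarial(x, rng) for x in images]
```

Samples collected from a model known to produce adversarial outputs would simply be paired with the same label instead of being synthesised.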

The method may be one wherein, during the blindspot proxy network training, the input images are obtained from the generative model and the labels are obtained from the loss function of the blindspot proxy network.

The method may be one wherein the blindspot proxy network is trained once on an adversarial sample for every N samples from the generative model, where N>1, e.g. N=20.
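The interleaving rule (one adversarial sample per N generated samples) can be expressed as a simple generator. This is a minimal sketch: `blindspot_schedule` is a hypothetical name, and cycling through the stored adversarial set is an assumption the text does not specify.

```python
from typing import Iterator, Sequence, Tuple


def blindspot_schedule(generated: Sequence, adversarial: Sequence,
                       n: int = 20) -> Iterator[Tuple[str, object]]:
    """Yield training samples for the blindspot proxy network: after every n
    samples from the generative model, yield one stored adversarial sample,
    cycling through the adversarial set."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    adv_index = 0
    for count, sample in enumerate(generated, start=1):
        yield ("generated", sample)
        if count % n == 0 and adversarial:
            yield ("adversarial", adversarial[adv_index % len(adversarial)])
            adv_index += 1
```

With 40 generated samples and N=20, the schedule contains 42 entries, with adversarial samples at positions 20 and 41.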

The method may be one wherein an online method of generating the adversarial samples is provided.

The method may be one wherein there exists a network configuration that, due to some model misspecification, by default generates only adversarial samples during its training; this network, referred to as the blind spot network, produces adversarial samples for our blindspot proxy network loss function and therefore helps to define the underlying non-differentiable function.

The method may be one wherein the blind spot network is used to generate adversarial samples to train the proxy against.

The method may be one wherein the set of adversarial samples is not stored, but is instead generated in an online fashion using the blind spot network, which is also learning.

According to a seventh aspect of the invention, there is provided a computer system configured to train an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, in which a blindspot network b_α is trained which generates an output image x̃ from an input image x, in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, and in which a blindspot proxy network is trained for labelling blindspot samples, wherein the computer system is configured to:

(i) use the blindspot network b_α to generate an output image x̃ from an input image x of the set of training images;

(ii) use the blindspot proxy network to output a blindspot function output ỹ, using x and x̃ as inputs;

(iii) use the gradient intractable perceptual metric to output a function output y, using x and x̃ as inputs;

(iv) evaluate a loss for the blindspot network, using y and ỹ as inputs;

(v) use backpropagation to compute gradients of parameters of the blindspot network with respect to the loss evaluated in (iv);

(vi) optimize the parameters of the blindspot network based on the results of (v), to provide an optimized blindspot network;

(vii) use the image generative network f_θ to generate an output image x̂ from an input image x, without tracking gradients for f_θ;

(viii) use the proxy network to output an approximated function output ŷ, using x and x̂ as inputs;

(ix) use the gradient intractable perceptual metric to output a function output y, using x and x̂ as inputs;

(x) evaluate a loss for the proxy network, using y and ŷ as inputs;

(xi) use backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in (x);

(xii) optimize the parameters of the proxy network based on the results of (xi), to provide an optimized proxy network;

(xiii) use the blindspot network b_α to generate an output image x̃ from an input image x, without tracking gradients for b_α;

(xiv) use the blindspot proxy network to output a representation ỹ, using x and x̃ as inputs;

(xv) use a blindspot label function to output a labelled output y of a blindspot sample;

(xvi) evaluate a loss for the blindspot proxy network, using y and ỹ as inputs;

(xvii) use backpropagation to compute gradients of parameters of the blindspot proxy network with respect to the loss evaluated in (xvi);

(xviii) optimize the parameters of the blindspot proxy network based on the results of (xvii), to provide an optimized blindspot proxy network;

(xix) use the image generative network f_θ to generate an output image x̂ from an input image x;

(xx) use the optimized proxy network to output an optimized approximated function output ŷ, using x and x̂ as inputs;

(xxi) evaluate a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs;

(xxii) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in (xxi);

(xxiii) optimize the parameters of the generative network f_θ based on the results of (xxii), to provide an optimized generative network f_θ, and

(xxiv) repeat (i) to (xxiii) for each member of the set of training images.
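Steps (i) to (xxiii) group into four update phases, each of which optimises exactly one network. The sketch below records that ordering as a plain Python data structure so it can be checked mechanically; the `Phase` class, its field names and `check_plan` are hypothetical illustrations rather than part of the described system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phase:
    steps: str                    # step numbers in the aspect above
    updated: str                  # network whose parameters are optimised
    no_grad: Tuple[str, ...]      # networks run without gradient tracking
    loss_inputs: Tuple[str, ...]  # quantities the phase's loss compares


F, B, P, BP = ("generative network f_theta", "blindspot network b_alpha",
               "proxy network", "blindspot proxy network")

# One iteration of steps (i)-(xxiii); step (xxiv) repeats it per training image.
ITERATION: List[Phase] = [
    Phase("(i)-(vi)", B, (), ("y", "y_tilde")),
    Phase("(vii)-(xii)", P, (F,), ("y", "y_hat")),
    Phase("(xiii)-(xviii)", BP, (B,), ("label y", "y_tilde")),
    Phase("(xix)-(xxiii)", F, (), ("x", "x_hat", "y_hat")),
]


def check_plan(plan: List[Phase]) -> None:
    """Raise ValueError unless each network is optimised exactly once and no
    phase disables gradient tracking for the network it updates."""
    updated = [p.updated for p in plan]
    if sorted(updated) != sorted([F, B, P, BP]):
        raise ValueError(f"each network must be optimised exactly once: {updated}")
    for p in plan:
        if p.updated in p.no_grad:
            raise ValueError(f"{p.steps}: cannot update {p.updated} without gradients")
```

Note that in phases (i)-(vi) and (xix)-(xxiii) gradients flow through a proxy whose own parameters are not updated in that phase; only the network named in `updated` is changed by the optimiser.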

The system may be one in which (i) to (xxiv) are repeated for the set of training images to train the generative network f_θ, to train the blindspot network b_α, to train the blindspot proxy network and to train the proxy network.

The system may be one in which the parameters of the trained generative network f_θ, the parameters of the trained blindspot network b_α, the parameters of the trained blindspot proxy network and the parameters of the trained proxy network are stored.

The computer system may be configured to perform a method of any aspect of the sixth aspect of the invention.

According to an eighth aspect of the invention, there is provided a computer program product executable on a processor to train an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, in which a blindspot network b_α is trained which generates an output image x̃ from an input image x, in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, and in which a blindspot proxy network is trained for labelling blindspot samples, the computer program product executable to:

(i) use the blindspot network b_α to generate an output image x̃ from an input image x of the set of training images;

(ii) use the blindspot proxy network to output a blindspot function output ỹ, using x and x̃ as inputs;

(iii) use the gradient intractable perceptual metric to output a function output y, using x and x̃ as inputs;

(iv) evaluate a loss for the blindspot network, using y and ỹ as inputs;

(v) use backpropagation to compute gradients of parameters of the blindspot network with respect to the loss evaluated in (iv);

(vi) optimize the parameters of the blindspot network based on the results of (v), to provide an optimized blindspot network;

(vii) use the image generative network f_θ to generate an output image x̂ from an input image x, without tracking gradients for f_θ;

(viii) use the proxy network to output an approximated function output ŷ, using x and x̂ as inputs;

(ix) use the gradient intractable perceptual metric to output a function output y, using x and x̂ as inputs;

(x) evaluate a loss for the proxy network, using y and ŷ as inputs;

(xi) use backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in (x);

(xii) optimize the parameters of the proxy network based on the results of (xi), to provide an optimized proxy network;

(xiii) use the blindspot network b_α to generate an output image x̃ from an input image x, without tracking gradients for b_α;

(xiv) use the blindspot proxy network to output a representation ỹ, using x and x̃ as inputs;

(xv) use a blindspot label function to output a labelled output y of a blindspot sample;

(xvi) evaluate a loss for the blindspot proxy network, using y and ỹ as inputs;

(xvii) use backpropagation to compute gradients of parameters of the blindspot proxy network with respect to the loss evaluated in (xvi);

(xviii) optimize the parameters of the blindspot proxy network based on the results of (xvii), to provide an optimized blindspot proxy network;

(xix) use the image generative network f_θ to generate an output image x̂ from an input image x;

(xx) use the optimized proxy network to output an optimized approximated function output ŷ, using x and x̂ as inputs;

(xxi) evaluate a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs;

(xxii) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in (xxi);

(xxiii) optimize the parameters of the generative network f_θ based on the results of (xxii), to provide an optimized generative network f_θ, and

(xxiv) repeat (i) to (xxiii) for each member of the set of training images.

The computer program product may be executable to repeat (i) to (xxiv) for the set of training images to train the generative network f_θ, to train the blindspot network b_α, to train the blindspot proxy network and to train the proxy network.

The computer program product may be executable to store the parameters of the trained generative network f_θ, the parameters of the trained blindspot network b_α, the parameters of the trained blindspot proxy network and the parameters of the trained proxy network.

The computer program product may be executable on the processor to perform a method of any aspect of the sixth aspect of the invention.

According to a ninth aspect of the invention, there is provided a computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, in which a blindspot network b_α is trained which generates an output image x̃ from an input image x, in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, and in which a blindspot proxy network is trained for labelling blindspot samples.

The method may be one including a method of any aspect of the sixth aspect of the invention.

According to a tenth aspect of the invention, there is provided a system including a first computer system and a second computer system, the first computer system including a lossy encoder including a first trained neural network, the second computer system including a decoder including a second trained neural network, wherein the second computer system is in communication with the first computer system, the lossy encoder configured to produce a bitstream from an input image; the first computer system configured to transmit the bitstream to the second computer system, wherein the decoder is configured to decode the bitstream to produce an output image; wherein the first computer system in communication with the second computer system comprises a generative network, wherein the generative network is trained using a method of any aspect of the sixth aspect of the invention.

The system may be one for image or video compression, transmission and decoding, wherein

(i) the first computer system is configured to receive an input image;

(ii) the first computer system is configured to encode the input image using the first trained neural network, to produce a latent representation;

(iii) the first computer system is configured to quantize the latent representation to produce a quantized latent;

(iv) the first computer system is configured to entropy encode the quantized latent into a bitstream;

(v) the first computer system is configured to transmit the bitstream to the second computer system;

(vi) the second computer system is configured to entropy decode the bitstream to produce the quantized latent;

(vii) the second computer system is configured to use the second trained neural network to produce an output image from the quantized latent, wherein the output image is an approximation of the input image. Quantizing, entropy encoding and entropy decoding details are provided in PCT/GB2021/051041.
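The data flow of steps (i) to (vii) can be traced end to end with toy stand-ins. In this minimal sketch the "trained neural networks" are hypothetical fixed scalings, quantization is uniform rounding, and zlib (DEFLATE) stands in for the entropy coder; none of these is the scheme described in PCT/GB2021/051041.

```python
import json
import zlib
from typing import List

SCALE = 0.5  # hypothetical stand-in for the first trained neural network
STEP = 0.1   # hypothetical quantization step


def encode(x: List[float]) -> List[float]:
    # (ii) stand-in encoder: produces the latent representation w.
    return [v * SCALE for v in x]


def quantize(w: List[float], step: float = STEP) -> List[int]:
    # (iii) uniform rounding quantizer; integer indices represent w_hat.
    return [round(v / step) for v in w]


def entropy_encode(q: List[int]) -> bytes:
    # (iv) lossless coding of the quantized latent into a bitstream.
    return zlib.compress(json.dumps(q).encode("utf-8"))


def entropy_decode(bitstream: bytes) -> List[int]:
    # (vi) recovers exactly the quantized latent.
    return json.loads(zlib.decompress(bitstream).decode("utf-8"))


def decode(q: List[int], step: float = STEP) -> List[float]:
    # (vii) stand-in decoder: output image approximating the input image.
    return [i * step / SCALE for i in q]
```

On the first computer system, `bitstream = entropy_encode(quantize(encode(x)))`; on the second, `x_hat = decode(entropy_decode(bitstream))`. The only loss of information is the rounding in `quantize`, so each output value is within STEP / (2 * SCALE) of its input.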

According to an eleventh aspect of the invention, there is provided a computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the method including the steps of:

(i) the image generative network f_θ generating an output image x̂ from an input image x of the set of training images, without tracking gradients for f_θ;

(ii) the proxy network outputting an approximated function output ŷ, using x and x̂ as inputs;

(iii) the gradient intractable perceptual metric outputting a function output y, using x and x̂ as inputs;

(iv) evaluating a loss for the proxy network, using y and ŷ as inputs;

(v) using backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in step (iv);

(vi) optimizing the parameters of the proxy network based on the results of step (v), to provide an optimized proxy network;

(vii) the image generative network f_θ generating an output image x̂ from an input image x;

(viii) the optimized proxy network outputting an optimized approximated function output ŷ, using x and x̂ as inputs;

(ix) evaluating a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs;

(x) using backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in step (ix);

(xi) optimizing the parameters of the generative network f_θ based on the results of step (x), to provide an optimized generative network f_θ, and

(xii) repeating steps (i) to (xi) for each member of the set of training images.
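To make steps (i) to (xii) concrete, here is a minimal toy sketch in plain Python with scalar "images" and hand-derived gradients. Every component is a hypothetical stand-in: f_θ(x) = θ·x plays the generative network, ĥ_ϕ(x, x̂) = ϕ·|x − x̂| plays the proxy network, and a quantised absolute error (lower is better) plays the gradient intractable perceptual metric. The generator loss uses only the proxy output ŷ; a real loss would typically add further terms, such as a rate term.

```python
from typing import Sequence, Tuple


def target_metric(x: float, x_hat: float) -> float:
    """Hypothetical gradient-intractable metric (lower is better): absolute
    error quantised to steps of 0.1, so its gradient is zero almost everywhere."""
    return round(abs(x - x_hat) * 10) / 10


def train(images: Sequence[float], epochs: int = 200,
          lr_proxy: float = 0.5, lr_gen: float = 0.01) -> Tuple[float, float]:
    theta = 0.5  # toy generative network: f_theta(x) = theta * x
    phi = 0.0    # toy proxy network: h_phi(x, x_hat) = phi * |x - x_hat|
    for _ in range(epochs):
        for x in images:
            # Steps (i)-(vi): update the proxy. x_hat is a plain number here,
            # so nothing flows back into theta (no gradient tracking for f_theta).
            x_hat = theta * x
            d = abs(x - x_hat)
            y = target_metric(x, x_hat)
            y_hat = phi * d
            # L_proxy = (y_hat - y)^2, so dL_proxy/dphi = 2 * (y_hat - y) * d
            phi -= lr_proxy * 2 * (y_hat - y) * d
            # Steps (vii)-(xi): update the generator through the optimised proxy.
            # L_gen = phi * |x - theta * x|, so
            # dL_gen/dtheta = phi * sign(theta * x - x) * x
            x_hat = theta * x
            sign = (x_hat > x) - (x_hat < x)
            theta -= lr_gen * phi * sign * x
    return theta, phi
```

Calling `train([0.5, 0.75, 1.0, 1.25, 1.5])` drives θ towards 1, i.e. towards reconstructing the input. Because the metric is piecewise constant, differentiating it directly would give zero gradient almost everywhere; the generator only learns because its gradient is taken through the differentiable proxy.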

An advantage is that the proxy network is more robust against adversarial samples. An advantage is that the proxy network is more robust against artifact generation. An advantage is that, within the field of learned image and video compression, the method allows networks to train with non-differentiable perceptual metrics.

The method may be one wherein the method is repeated for the set of training images, to train the generative network f_θ and to train the proxy network.

The method may be one including the step (xiii) of storing the parameters of the trained generative network f_θ and the parameters of the trained proxy network.

The method may be one wherein the image generative network f_θ is a neural network.

The method may be one wherein the proxy network is a neural network.

According to a twelfth aspect of the invention, there is provided a computer system configured to train an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the computer system configured to:

(i) use the image generative network f_θ to generate an output image x̂ from an input image x of the set of training images, without tracking gradients for f_θ;

(ii) use the proxy network to output an approximated function output ŷ, using x and x̂ as inputs;

(iii) use the gradient intractable perceptual metric to output a function output y, using x and x̂ as inputs;

(iv) evaluate a loss for the proxy network, using y and ŷ as inputs;

(v) use backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in (iv);

(vi) optimize the parameters of the proxy network based on the results of (v), to provide an optimized proxy network;

(vii) use the image generative network f_θ to generate an output image x̂ from an input image x;

(viii) use the optimized proxy network to output an optimized approximated function output ŷ, using x and x̂ as inputs;

(ix) evaluate a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs;

(x) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in (ix);

(xi) optimize the parameters of the generative network f_θ based on the results of (x), to provide an optimized generative network f_θ, and

(xii) repeat (i) to (xi) for each member of the set of training images.

The computer system may be one wherein (i) to (xii) are repeated for the set of training images, to train the generative network f_θ and to train the proxy network.

The computer system may be configured to perform a method of any aspect of the eleventh aspect of the invention.

According to a thirteenth aspect of the invention, there is provided a computer program product executable on a processor to train an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the computer program product executable to:

(i) use the image generative network f_θ to generate an output image x̂ from an input image x of the set of training images, without tracking gradients for f_θ;

(ii) use the proxy network to output an approximated function output ŷ, using x and x̂ as inputs;

(iii) use the gradient intractable perceptual metric to output a function output y, using x and x̂ as inputs;

(iv) evaluate a loss for the proxy network, using y and ŷ as inputs;

(v) use backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in (iv);

(vi) optimize the parameters of the proxy network based on the results of (v), to provide an optimized proxy network;

(vii) use the image generative network f_θ to generate an output image x̂ from an input image x;

(viii) use the optimized proxy network to output an optimized approximated function output ŷ, using x and x̂ as inputs;

(ix) evaluate a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs;

(x) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in (ix);

(xi) optimize the parameters of the generative network f_θ based on the results of (x), to provide an optimized generative network f_θ, and

(xii) repeat (i) to (xi) for each member of the set of training images.

The computer program product may be one wherein (i) to (xii) are repeated for the set of training images, to train the generative network f_θ and to train the proxy network.

The computer program product may be executable on the processor to perform a method of any aspect of the eleventh aspect of the invention.

According to a fourteenth aspect of the invention, there is provided a computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x.

The method may include a method of any aspect of the eleventh aspect of the invention.

According to a fifteenth aspect of the invention, there is provided a system including a first computer system and a second computer system, the first computer system including a lossy encoder including a first trained neural network, the second computer system including a decoder including a second trained neural network, wherein the second computer system is in communication with the first computer system, the lossy encoder configured to produce a bitstream from an input image; the first computer system configured to transmit the bitstream to the second computer system, wherein the decoder is configured to decode the bitstream to produce an output image; wherein the first computer system in communication with the second computer system comprises a generative network, wherein the generative network is trained using a method of any aspect of the eleventh aspect of the invention.

The system may be one in which the system is for image or video compression, transmission and decoding, wherein

(i) the first computer system is configured to receive an input image;

(ii) the first computer system is configured to encode the input image using the first trained neural network, to produce a latent representation;

(iii) the first computer system is configured to quantize the latent representation to produce a quantized latent;

(iv) the first computer system is configured to entropy encode the quantized latent into a bitstream;

(v) the first computer system is configured to transmit the bitstream to the second computer system;

(vi) the second computer system is configured to entropy decode the bitstream to produce the quantized latent;

(vii) the second computer system is configured to use the second trained neural network to produce an output image from the quantized latent, wherein the output image is an approximation of the input image. Quantizing, entropy encoding and entropy decoding details are provided in PCT/GB2021/051041.

Aspects of the invention may be combined.

In the above methods and systems, an image may be a single image, or an image may be a video image, or images may be a set of video images, for example.

The above methods and systems may be applied in the video domain.

A network may be a neural network. Networks may be neural networks.

For each of the above methods, a related system may be provided.

For each of the above training methods, a related computer program product may be provided.

BRIEF DESCRIPTION OF THE FIGURES

Aspects of the invention will now be described, by way of example(s), with reference to the following Figures, in which:

FIG. 1 shows an example of a generative network f_θ(x) = x̂, and a differentiable proxy network ĥ_ϕ(x, x̂) = ŷ which approximates a non-differentiable target function (GIF) h_ξ(x, x̂) = y. Note that both networks f_θ and ĥ_ϕ can be trained at the same time.

FIG. 2A shows an example in which training f_θ requires gradient flow via ĥ_ϕ and parameter updates from the optimiser opt{f_θ}. The dotted arrows indicate schematically the direction of back-propagation.

FIG. 2B shows an example in which training ĥ_ϕ involves samples of f_θ(x) and x, but does not require gradients for x̂. ĥ_ϕ is trained to minimise the loss L_proxy(ŷ, y) with optimizer opt{ĥ_ϕ}. The dotted arrows indicate schematically the direction of back-propagation.

FIG. 3 shows an example of a structure of a proxy network ĥ_ϕ(x, x̂) = ŷ.

FIG. 4 shows an example of a resblock component with 3 internal blocks (x3). For example, “(128, 256, 2)” indicates that there are 128 channels in α and 256 channels in β, and “2” indicates that a stride of 2 is used to downsample at the end of the sequence. The circle with a “+” at its centre indicates element-wise addition. For example, “Conv2d(128, 128, 1)” indicates a 2D convolutional operation with 128 input channels, 128 output channels, a stride of 1 and a default padding of size stride/2.

FIG. 5 shows an example in which an auto-encoder is the generative network of FIG. 1.

FIG. 6 shows an example of adversarial samples generated by a generative network f_θ where h_ξ is VMAF. The white bounding boxes indicate the corresponding enlarged regions in FIG. 7. Note that the distorted image has a VMAF score of 85 out of approximately 96.

FIG. 7 shows an example of adversarial samples generated by the generative network f_θ where h_ξ is VMAF. The images shown in the figure are enlarged views of the corresponding regions contained within the white bounding boxes shown in FIG. 6. Notice the checkerboard-like artifacts in the distorted image, which have been learnt by the generative network f_θ as a way of minimizing the loss corresponding to f_θ. VMAF is susceptible to these types of artifacts, which possibly lie outside the region in which the function is well-defined: VMAF scores them as though they aligned well with human perception, so the generative network f_θ treats images with these artifacts as perceptually more similar. The distorted image is referred to as an adversarial sample.

FIG. 8 shows an example of multiscale training for the case of images $x \in \mathbb{R}^{3}$, where for each image x, an RGB image at three different scales is provided. The generative network, the proxy and the perceptual metric each process every scale of the image, and the results are aggregated at the end using some function, such as a mean operator.
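The multiscale scheme of FIG. 8 can be sketched as follows, assuming single-channel images as nested lists (instead of RGB), 2x2 average pooling to build the three scales, and a mean over per-scale scores. The pooling choice, the MSE metric and the function names are hypothetical stand-ins for the networks and perceptual metric in the figure.

```python
from typing import Callable, List

Image = List[List[float]]  # single-channel image, for brevity


def downsample2x(x: Image) -> Image:
    """2x2 average pooling; assumes even height and width."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def image_pyramid(x: Image, scales: int = 3) -> List[Image]:
    """Return [x, x downsampled once, x downsampled twice, ...]."""
    pyramid = [x]
    for _ in range(scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def multiscale_loss(generator: Callable[[Image], Image],
                    metric: Callable[[Image, Image], float],
                    x: Image, scales: int = 3) -> float:
    """Run the generator and the metric at every scale, then aggregate the
    per-scale scores with a mean operator."""
    scores = [metric(xs, generator(xs)) for xs in image_pyramid(x, scales)]
    return sum(scores) / len(scores)


def mse(a: Image, b: Image) -> float:
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)
```

For an 8x8 input the three scales are 8x8, 4x4 and 2x2; a generator that adds a constant 0.1 everywhere scores an MSE of 0.01 at every scale, and therefore 0.01 overall.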

FIG. 9 shows a training example in which a set of adversarial samples x̃_i is introduced, with associated labels ỹ_i. The loss surface of ĥ_ϕ is directly discouraged from entering blind spots by training against the sample set x̃_i with the self-imposed label set ỹ_i.

FIG. 10 shows a training example in which a blind spot network is introduced, with associated outputs x̃_i. The loss surface of ĥ_ϕ is directly discouraged from entering the boundaries of blind spots by training against the samples from the blind spot network with self-imposed labels ỹ_i. The blind spot network itself is trained using a proxy network. The blind spot network can use either the same proxy network as the encoder-decoder network (as in this figure) or a different one (not shown).

FIG. 11 shows a schematic diagram of an artificial intelligence (AI)-based compression process, including encoding an input image x using a neural network E(...), and decoding using a neural network D(...), to provide an output image x̂. Runtime issues are relevant to the Encoder. Runtime issues are relevant to the Decoder. Examples of issues of relevance to parts of the process are identified.

DETAILED DESCRIPTION Technology Overview

We provide a high-level overview of some aspects of our artificial intelligence (AI)-based (e.g. image and/or video) compression technology.

In general, compression can be lossless or lossy. In both lossless and lossy compression, the file size is reduced. The file size is sometimes referred to as the “rate”.

In lossy compression, however, the output is allowed to differ from the input. The output image x̂ obtained after reconstruction from a bitstream relating to a compressed image is not the same as the input image x. The fact that the output image x̂ may differ from the input image x is represented by the hat over the “x”. The difference between x and x̂ may be referred to as “distortion”, or “a difference in image quality”. Lossy compression may be characterized by the “output quality”, or “distortion”.

Although our pipeline may contain some lossless compression, overall the pipeline uses lossy compression.

Usually, as the rate goes up, the distortion goes down. The relation between these quantities for a given compression scheme is called the “rate-distortion equation”. For example, a goal in improving compression technology is to obtain reduced distortion for a fixed size of compressed file, which would provide an improved rate-distortion equation. For example, the distortion can be measured using the mean square error (MSE) between the pixels of x and x̂, but there are many other ways of measuring distortion, as will be clear to the person skilled in the art. Known compression and decompression schemes include, for example, JPEG, JPEG2000, AVC, HEVC and AVI.

In an example, our approach includes using deep learning and AI to provide an improved compression and decompression scheme, or improved compression and decompression schemes.

In an example of an artificial intelligence (AI)-based compression process, an input image x is provided. There is provided a neural network characterized by a function E(...) which encodes the input image x. This neural network E(...) produces a latent representation, which we call w. The latent representation is quantized to provide ŵ, a quantized latent. The quantized latent goes to another neural network characterized by a function D(...), which is a decoder. The decoder provides an output image, which we call x̂. The quantized latent ŵ is entropy-encoded into a bitstream.

For example, the encoder is a library which is installed on a user device, e.g. laptop computer, desktop computer, smart phone. The encoder produces the latent w, which is quantized to ŵ, which is entropy encoded to provide the bitstream, and the bitstream is sent over the internet to a recipient device. The recipient device entropy decodes the bitstream to provide ŵ, and then uses the decoder, which is a library installed on the recipient device (e.g. laptop computer, desktop computer, smart phone), to provide the output image x̂.

E may be parametrized by a convolution matrix Θ such that w = E_Θ(x).

D may be parametrized by a convolution matrix Ω such that x̂ = D_Ω(ŵ).

We need to find a way to learn the parameters Θ and Ω of the neural networks.

The compression pipeline may be parametrized using a loss function L. In an example, we use gradient descent on the loss function, with gradients computed by back-propagation using the chain rule, to update the weight parameters Θ and Ω of the neural networks using the gradients ∂L/∂Θ and ∂L/∂Ω.

The loss function is the rate-distortion trade-off. The distortion function is $\mathcal{D}(x, \hat{x})$, which produces a value, which is the loss of the distortion. The loss function can be used to back-propagate the gradient to train the neural networks.

So, for example, we take an input image, evaluate the loss function, perform backward propagation, and train the neural networks. This is repeated for a training set of input images until the pipeline is trained. The trained neural networks can then provide good quality output images.

An example image training set is the KODAK image set (e.g. at www.cs.albany.edu/~xypan/research/snr/Kodak.html). An example image training set is the IMAX image set. An example image training set is the Imagenet dataset (e.g. at www.image-net.org/download). An example image training set is the CLIC Training Dataset P (“professional”) and M (“mobile”) (e.g. at http://challenge.compression.cc/tasks/).

In an example, the production of the bitstream from ŵ is lossless compression.

The pipeline needs a loss that can be used for training, and the loss needs to resemble the rate-distortion trade-off.

A loss which may be used for neural network training is $\text{Loss} = \mathcal{D} + \lambda R$, where $\mathcal{D}$ is the distortion function, $\lambda$ is a weighting factor, and $R$ is the rate loss. $R$ is related to entropy. Both $\mathcal{D}$ and $R$ are differentiable functions.
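A minimal sketch of evaluating this loss follows, assuming MSE as the distortion $\mathcal{D}$ and estimating the rate R as the ideal code length, in bits, of the quantized latents under a known probability mass function (the sum of −log₂ p(ŵ_i)). The pmf and the value of λ are hypothetical. In training, R would need a differentiable surrogate; this evaluation-only sketch does not implement one.

```python
import math
from typing import Dict, List


def mse(x: List[float], x_hat: List[float]) -> float:
    """Distortion term D: mean square error between pixels of x and x_hat."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rate_bits(w_hat: List[int], pmf: Dict[int, float]) -> float:
    """Rate term R: ideal code length in bits, -sum(log2 p(w_hat_i)), under a
    hypothetical probability mass function over quantized latent values."""
    return -sum(math.log2(pmf[v]) for v in w_hat)


def rd_loss(x: List[float], x_hat: List[float], w_hat: List[int],
            pmf: Dict[int, float], lam: float) -> float:
    """Loss = D + lambda * R."""
    return mse(x, x_hat) + lam * rate_bits(w_hat, pmf)
```

For example, with pmf {−1: 0.25, 0: 0.5, 1: 0.25}, ŵ = [0, 0, 1, −1] costs 1 + 1 + 2 + 2 = 6 bits; with an MSE of 0.25 and λ = 0.01 the loss is 0.25 + 0.06 = 0.31. Increasing λ favours smaller files at the cost of more distortion.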

Distortion functions $\mathcal{D}(x, \hat{x})$ which correlate well with the human vision system are hard to identify. There exist many candidate distortion functions, but typically these do not correlate well with the human vision system when a wide variety of possible distortions is considered.

We want humans who view picture or video content on their devices to have a pleasing visual experience, for the smallest possible file size transmitted to the devices. So we have focused on providing improved distortion functions, which correlate better with the human visual system. Modern distortion functions very often contain a neural network, which transforms the input and the output into a perceptual space before comparing the input and the output. The neural network can be a generative adversarial network (GAN), which performs some hallucination. There can also be some stabilization. It appears that humans evaluate image quality over density functions.

Hallucinating is providing fine detail in an image that is generated for the viewer. Not all of the fine, higher spatial frequency detail needs to be accurately transmitted, because some of it can be generated at the receiver end, given suitable cues for generating the fine details, where the cues are sent from the transmitter.

FIG. 11 shows a schematic diagram of an artificial intelligence (AI)-based compression process, including encoding an input image x using a neural network, and decoding using a neural network, to provide an output image x̂.

In an example of a layer in an encoder neural network, the layer includes a convolution, a bias and an activation function. In an example, four such layers are used.
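A minimal NumPy sketch of such an encoder follows. It assumes 2×2 kernels with stride 2, ReLU as the activation function and channel widths 3→8→8→8→4; all of these are illustrative choices, not taken from the text. Each of the four layers applies a convolution, adds a bias and applies the activation:

```python
import numpy as np

def conv2d(x, w, b, stride=2):
    """Valid 2D convolution (cross-correlation) plus bias.
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    h_out = (h - k) // stride + 1
    w_out = (wd - k) // stride + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]

def init_params(channels, k=2, seed=0):
    """One (weights, bias) pair per layer: random weights, zero biases."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (c_out, c_in, k, k)), np.zeros(c_out))
            for c_in, c_out in zip(channels[:-1], channels[1:])]

def encoder(x, params):
    """Each layer: convolution, bias, then ReLU activation."""
    for w, b in params:
        x = np.maximum(conv2d(x, w, b), 0.0)
    return x

params = init_params([3, 8, 8, 8, 4])          # four layers
image = np.random.default_rng(1).uniform(size=(3, 32, 32))
latent = encoder(image, params)
print(latent.shape)  # (4, 2, 2)
```

Each stride-2 layer halves the spatial size (32 → 16 → 8 → 4 → 2), which is one common way an encoder reduces an image to a compact latent.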

There is provided a computer-implemented method for lossy image or video compression, transmission and decoding, the method including the steps of:

(i) receiving an input image at a first computer system;

(ii) encoding the input image using a first trained neural network, using the first computer system, to produce a latent representation;

(iii) quantizing the latent representation using the first computer system to produce a quantized latent;

(iv) entropy encoding the quantized latent into a bitstream, using the first computer system;

(v) transmitting the bitstream to a second computer system;

(vi) the second computer system entropy decoding the bitstream to produce the quantized latent;

(vii) the second computer system using a second trained neural network to produce an output image from the quantized latent, wherein the output image is an approximation of the input image. A related system including a first computer system, a first trained neural network, a second computer system and a second trained neural network may be provided.
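The steps above can be sketched end to end as a toy, under stated assumptions. 2×2 average pooling stands in for the first trained network, nearest-neighbour upsampling for the second, rounding for quantization, and zlib (DEFLATE) for the entropy coder. A learned codec would instead use an arithmetic or range coder driven by a learned probability model:

```python
import zlib
import numpy as np

def encode(x):
    """Stand-in for the first trained network: 2x2 average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def quantize(y):
    return np.round(y).astype(np.int16)

def entropy_encode(q):
    """Lossless coding of the quantized latent (zlib as a stand-in)."""
    return zlib.compress(q.tobytes())

def entropy_decode(bitstream, shape):
    return np.frombuffer(zlib.decompress(bitstream), dtype=np.int16).reshape(shape)

def decode(q):
    """Stand-in for the second trained network: nearest-neighbour upsampling."""
    return np.repeat(np.repeat(q.astype(np.float64), 2, axis=0), 2, axis=1)

# First computer system: steps (i) to (iv).
x = np.random.default_rng(0).uniform(0, 255, size=(8, 8))
q_sent = quantize(encode(x))
bitstream = entropy_encode(q_sent)
# Step (v): transmit `bitstream`. Second computer system: steps (vi) and (vii).
q_received = entropy_decode(bitstream, q_sent.shape)
x_hat = decode(q_received)
```

Only the quantization step discards information. The entropy coding round trip returns exactly the quantized latent that was sent.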

An advantage is that for a fixed file size (“rate”), a reduced output image distortion is obtained. An advantage is that for a fixed output image distortion, a reduced file size (“rate”) is obtained.

There is provided a computer-implemented method of training a first neural network and a second neural network, the neural networks being for use in lossy image or video compression, transmission and decoding, the method including the steps of:

(i) receiving an input training image;

(ii) encoding the input training image using the first neural network, to produce a latent representation;

(iii) quantizing the latent representation to produce a quantized latent;

(iv) using the second neural network to produce an output image from the quantized latent, wherein the output image is an approximation of the input image;

(v) evaluating a loss function based on differences between the output image and the input training image;

(vi) evaluating a gradient of the loss function;

(vii) back-propagating the gradient of the loss function through the second neural network and through the first neural network, to update weights of the second neural network and of the first neural network; and

(viii) repeating steps (i) to (vii) using a set of training images, to produce a trained first neural network and a trained second neural network, and

(ix) storing the weights of the trained first neural network and of the trained second neural network. A related computer program product may be provided.
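A minimal NumPy sketch of steps (i) to (ix) follows, under several assumptions. Single linear layers stand in for the first and second networks, and MSE is the loss. During training, additive uniform noise replaces rounding, a common relaxation because rounding has zero gradient almost everywhere. Hard rounding is used only at evaluation. The data are synthetic low-rank vectors standing in for training images:

```python
import numpy as np

def train(x, latent_dim=4, lr=0.05, steps=600, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (d, latent_dim))   # first network (toy, linear)
    w_dec = rng.normal(0.0, 0.1, (latent_dim, d))   # second network (toy, linear)
    for _ in range(steps):
        y = x @ w_enc                                # (ii) latent representation
        y_q = y + rng.uniform(-0.5, 0.5, y.shape)    # (iii) noisy stand-in for rounding
        x_hat = y_q @ w_dec                          # (iv) output approximating x
        err = x_hat - x                              # (v) loss = mean(err ** 2)
        g_xhat = 2.0 * err / err.size                # (vi) dLoss/dx_hat
        g_wdec = y_q.T @ g_xhat                      # (vii) through the second network
        g_y = g_xhat @ w_dec.T                       #       additive noise: dy_q/dy = 1
        g_wenc = x.T @ g_y                           #       through the first network
        w_dec -= lr * g_wdec
        w_enc -= lr * g_wenc
    return w_enc, w_dec                              # (ix) weights to store

def eval_loss(x, w_enc, w_dec):
    """Distortion with hard rounding of the latent, as at inference time."""
    x_hat = np.round(x @ w_enc) @ w_dec
    return float(np.mean((x_hat - x) ** 2))

rng = np.random.default_rng(1)
x = rng.normal(size=(256, 2)) @ rng.normal(0.0, np.sqrt(0.5), (2, 8))  # training set
before = eval_loss(x, *train(x, steps=0))   # untrained weights
after = eval_loss(x, *train(x))             # (viii) after repeated passes
print(before, after)
```

Here each iteration processes the whole training set at once. Per-image updates, as written in step (viii), would follow the same gradient computation.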

An advantage is that, when using the trained first neural network and the trained second neural network, for a fixed file size (“rate”), a reduced output image distortion is obtained; and for a fixed output image distortion, a reduced file size (“rate”) is obtained.

Example Aspects of Adversarial Learning of Differentiable Proxy of Gradient Intractable Networks

A generative network f_θ, which generates an output image x̂ from an input image x, is provided. A differentiable proxy network ĥ_ϕ, which generates a function output ŷ from x and x̂ according to ĥ_ϕ(x, x̂) = ŷ, is provided. The differentiable proxy network ĥ_ϕ approximates a non-differentiable target function (a gradient intractable function, GIF) h_ξ, which generates a function output y from x and x̂ according to h_ξ(x, x̂) = y. It is possible to train both networks f_θ and ĥ_ϕ at the same time. An example is shown in FIG. 1.

In an example, training f_θ requires gradient flow via ĥ_ϕ and parameter updates for f_θ from an optimiser opt{f_θ}. An example is shown in FIG. 2A, in which the dotted arrows indicate schematically the direction of back-propagation.

In an example, training ĥ_ϕ involves samples of f_θ(x) and x, but does not require gradients for x̂. ĥ_ϕ is trained to minimise the loss L_proxy(ŷ, y) with an optimiser opt{ĥ_ϕ}. An example is shown in FIG. 2B, in which the dotted arrows indicate schematically the direction of back-propagation.
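The alternating scheme of FIG. 2A and FIG. 2B can be sketched with a deliberately tiny toy, under heavy assumptions:

- the generator is a single gain f_θ(x) = θx;
- the gradient intractable metric is a hypothetical black-box score (`black_box_metric`) that is only ever evaluated;
- the proxy ĥ_ϕ is linear in the mean squared error;
- the proxy's optimiser step is replaced by a closed-form least-squares refit.

Each iteration first fits the proxy on generator samples that are treated as fixed. It then updates θ by gradient ascent through the proxy only:

```python
import numpy as np

def black_box_metric(x, x_hat):
    """Hypothetical stand-in for a GIF h_xi: evaluated only, never differentiated."""
    return 100.0 * np.exp(-10.0 * np.mean((x - x_hat) ** 2))

def train(steps=30, batch=16, lr_gen=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.5                     # toy generator f_theta(x) = theta * x
    phi = np.zeros(2)               # toy proxy: h_phi = phi[0] + phi[1] * mse(x, x_hat)
    for _ in range(steps):
        xs = [c * rng.uniform(0.0, 1.0, 256) for c in rng.uniform(0.2, 1.5, batch)]
        # Proxy pass (FIG. 2B): generator outputs are used as fixed samples.
        x_hats = [theta * x for x in xs]
        d = np.array([np.mean((x - xh) ** 2) for x, xh in zip(xs, x_hats)])
        y = np.array([black_box_metric(x, xh) for x, xh in zip(xs, x_hats)])
        design = np.stack([np.ones_like(d), d], axis=1)
        phi = np.linalg.lstsq(design, y, rcond=None)[0]   # fit h_phi to the labels y
        # Generator pass (FIG. 2A): the gradient of the mean proxy score flows via h_phi only.
        # d/dtheta mean((theta * x - x) ** 2) = 2 * (theta - 1) * mean(x ** 2)
        grad = np.mean([phi[1] * 2.0 * (theta - 1.0) * np.mean(x ** 2) for x in xs])
        theta += lr_gen * grad      # ascent: maximise the predicted quality score
    return theta, phi

theta, phi = train()
print(theta)  # moves from 0.5 towards 1.0 (identity reconstruction)
```

The generator never calls the black-box metric's gradient. The learned slope phi[1] is the only route by which the metric's preference for θ = 1 reaches the generator.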

A generative network f_θ, which generates an output image x̂ from an input image x, is provided. In an example, the generative network f_θ includes an encoder, which encodes (e.g. which performs lossy encoding) an input image x into a bitstream, and includes a decoder, which decodes the bitstream into an output image x̂. A differentiable proxy network ĥ_ϕ, which generates a function output ŷ from x and x̂ according to ĥ_ϕ(x, x̂) = ŷ, is provided. The differentiable proxy network ĥ_ϕ approximates a non-differentiable target function (GIF) h_ξ, which generates a function output y from x and x̂ according to h_ξ(x, x̂) = y. It is possible to train both networks f_θ and ĥ_ϕ at the same time. An example is shown in FIG. 5.

Adversarial samples may be generated by a generative network f_θ where h_ξ is VMAF. FIG. 6 shows an example of adversarial samples generated by a generative network f_θ where h_ξ is VMAF. The white bounding boxes indicate the corresponding enlarged regions in FIG. 7. Note that the distorted image has a VMAF score of 85 out of approximately 96.

A generative network f_θ, which generates an output image x̂_i from an input image x_i, is provided. In an example, the generative network f_θ includes an encoder, which encodes (e.g. which performs lossy encoding) an input image x_i into a bitstream, and includes a decoder, which decodes the bitstream into an output image x̂_i. A differentiable proxy network ĥ_ϕ, which generates a function output ŷ_i from x_i and x̂_i according to ĥ_ϕ(x_i, x̂_i) = ŷ_i, is provided. The differentiable proxy network ĥ_ϕ approximates a non-differentiable target function (GIF) h_ξ, which generates a function output y_i from x_i and x̂_i according to h_ξ(x_i, x̂_i) = y_i. It is possible to train both networks f_θ and ĥ_ϕ at the same time. Multiscale training is provided for the case of multiscale images $x_i \in \mathbb{R}^3$, where for each image x, an RGB image at a plurality of different scales is used. The generative network f_θ, along with the proxy network ĥ_ϕ and the perceptual metric h_ξ, processes each scale of the image, and finally an aggregation is performed using an aggregation function, such as a mean operator. FIG. 8 shows an example of multiscale training for the case of images $x_i \in \mathbb{R}^3$, where for each image x, an RGB image at three different scales is provided: x_i, where i = 1, 2 or 3.
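A minimal NumPy sketch of building the multiscale set {x_i} and aggregating per-scale scores with a mean operator follows. It assumes 2×2 average pooling as the downsampling, so the three scales are full, half and quarter resolution, and image sides divisible by four. `metric` and `generator` are placeholders for h_ξ (or ĥ_ϕ) and f_θ:

```python
import numpy as np

def downsample2(img):
    """2x2 average pooling of an (H, W, 3) image with even H and W."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def multiscale_set(x, n_scales=3):
    """Return [x_1, x_2, ...]: x at full, half, quarter, ... resolution."""
    scales = [x]
    for _ in range(n_scales - 1):
        scales.append(downsample2(scales[-1]))
    return scales

def multiscale_score(metric, generator, x, n_scales=3):
    """Evaluate metric(x_i, generator(x_i)) at every scale and aggregate with the mean."""
    scores = [metric(xi, generator(xi)) for xi in multiscale_set(x, n_scales)]
    return float(np.mean(scores))

x = np.random.default_rng(0).uniform(size=(64, 64, 3))
print([xi.shape for xi in multiscale_set(x)])  # [(64, 64, 3), (32, 32, 3), (16, 16, 3)]
mse = lambda a, b: float(np.mean((a - b) ** 2))
print(multiscale_score(mse, lambda z: z, x))   # identity generator: 0.0 at every scale
```

Other aggregation functions (e.g. a weighted mean favouring full resolution) would replace `np.mean` without changing the rest of the sketch.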

A generative network f_θ, which generates an output image x̂_i from an input image x_i, is provided. In an example, the generative network f_θ includes an encoder, which encodes (e.g. which performs lossy encoding) an input image x_i into a bitstream, and includes a decoder, which decodes the bitstream into an output image x̂_i. A differentiable proxy network ĥ_ϕ, which generates a function output ŷ_i from x_i and x̂_i according to ĥ_ϕ(x_i, x̂_i) = ŷ_i, is provided. The differentiable proxy network ĥ_ϕ approximates a non-differentiable target function (GIF) h_ξ, which generates a function output y_i from x_i and x̂_i according to h_ξ(x_i, x̂_i) = y_i. It is possible to train both networks f_θ and ĥ_ϕ at the same time. Multiscale training is provided for the case of multiscale images $x_i \in \mathbb{R}^3$, where for each image x, an RGB image at a plurality of different scales is used. The generative network f_θ, along with the proxy network ĥ_ϕ and the perceptual metric h_ξ, processes each scale of the image, and finally an aggregation is performed using an aggregation function, such as a mean operator. In an example, a set of adversarial samples x̃_i is introduced, with associated labels ỹ_i, which are generated according to ĥ_ϕ(x_i, x̃_i) = ỹ_i. The loss surface of ĥ_ϕ is directly discouraged from entering blind spots by training against the sample set x̃_i with the self-imposed label set ỹ_i. FIG. 9 shows a training example in which a set of adversarial samples x̃_i is introduced, with associated labels ỹ_i, and the loss surface of ĥ_ϕ is directly discouraged from entering blind spots by training against the sample set x̃_i with the self-imposed label set ỹ_i.

A generative network f_θ, which generates an output image x̂_i from an input image x_i, is provided. In an example, the generative network f_θ includes an encoder, which encodes (e.g. which performs lossy encoding) an input image x_i into a bitstream, and includes a decoder, which decodes the bitstream into an output image x̂_i. A differentiable proxy network ĥ_ϕ, which generates a function output ŷ_i from x_i and x̂_i according to ĥ_ϕ(x_i, x̂_i) = ŷ_i, is provided. The differentiable proxy network ĥ_ϕ approximates a non-differentiable target function (GIF) h_ξ, which generates a function output y_i from x_i and x̂_i according to h_ξ(x_i, x̂_i) = y_i. It is possible to train both networks f_θ and ĥ_ϕ at the same time. Multiscale training may be provided for the case of multiscale images $x_i \in \mathbb{R}^3$, where for each image x, an RGB image at a plurality of different scales is used. The generative network f_θ, along with the proxy network ĥ_ϕ and the perceptual metric h_ξ, processes each scale of the image, and finally an aggregation is performed using an aggregation function, such as a mean operator. In an example, a set of adversarial samples x̃_i is generated by a blind spot network from a set of x_i. The x̃_i have associated labels ỹ_i, which are generated according to ĥ_ϕ(x_i, x̃_i) = ỹ_i. The loss surface of ĥ_ϕ is directly discouraged from entering blind spots by training against the sample set x̃_i with the self-imposed label set ỹ_i. The blind spot network itself may be trained using a proxy network. The blind spot network can use either the same proxy network as the encoder-decoder network (as in FIG. 10) or a different proxy network (not shown in FIG. 10). FIG. 10 shows a training example in which a blind spot network is present.

In an example of a trained generative network, an encoder including a first trained neural network is provided on a first computer system, and a decoder is provided on a second computer system in communication with the first computer system, the decoder including a second trained neural network. The encoder produces a bitstream from an input image; the bitstream is transmitted to the second computer system, where the decoder decodes the bitstream to produce an output image. The output image may be an approximation of the input image.

The first computer system may be a server, e.g. a dedicated server, e.g. a machine in the cloud with dedicated GPUs, e.g. Amazon Web Services, Microsoft Azure, etc., or any other cloud computing service.

The first computer system may be a user device. The user device may be a laptop computer, desktop computer, a tablet computer or a smart phone.

The first trained neural network may include a library installed on the first computer system.

The first trained neural network may be parametrized by one or several convolution matrices Θ, or the first trained neural network may be parametrized by a set of bias parameters, non-linearity parameters, convolution kernel/matrix parameters.

The second computer system may be a recipient device.

The recipient device may be a laptop computer, desktop computer, a tablet computer, a smart TV or a smart phone.

The second trained neural network may include a library installed on the second computer system.

The second trained neural network may be parametrized by one or several convolution matrices Ω, or the second trained neural network may be parametrized by a set of bias parameters, non-linearity parameters, convolution kernel/matrix parameters.

Notes Re VMAF

Video Multimethod Assessment Fusion (VMAF) is an objective full-reference video quality metric. It predicts subjective video quality based on a reference and a distorted video sequence. The metric can be used to evaluate the quality of different video codecs, encoders, encoding settings, or transmission variants.

VMAF uses existing image quality metrics and other features to predict video quality:

- Visual Information Fidelity (VIF): considers information fidelity loss at four different spatial scales.
- Detail Loss Metric (DLM): measures loss of details, and impairments which distract viewer attention.
- Mean Co-Located Pixel Difference (MCPD): measures temporal difference between frames on the luminance component.
- Anti-noise signal-to-noise ratio (AN-SNR).

The above features are fused using a support-vector machine (SVM)-based regression to provide a single output score in the range of 0-100 per video frame, with 100 being quality identical to the reference video. These scores are then temporally pooled over the entire video sequence using the arithmetic mean to provide an overall differential mean opinion score (DMOS).

Due to the public availability of the training source code (“VMAF Development Kit”, VDK), the fusion method can be re-trained and evaluated based on different video datasets and features.

Regarding perceptual-specific GIFs, some examples other than VMAF are:

- VIF: Visual Information Fidelity
- DLM: Detail Loss Metric
- IFC: Information Fidelity Criterion.

Regarding perceptual-specific GIFs, an example class of GIFs is mutual-information-based estimators.

Notes Re Training

Regarding seeding the neural networks for training, all the neural network parameters can be randomized with standard methods (such as Xavier Initialization). Typically, we find that satisfactory results are obtained with sufficiently small learning rates.
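For reference, Xavier (Glorot) uniform initialization draws each weight from U(−a, a) with a = √(6/(fan_in + fan_out)), which gives a variance of 2/(fan_in + fan_out). A minimal NumPy sketch follows. For a convolution, fan_in and fan_out would include the kernel area, e.g. C_in·k·k and C_out·k·k:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization for a (fan_in, fan_out) weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
w = xavier_uniform(400, 600, rng)
print(w.var(), 2.0 / (400 + 600))  # sample variance close to the target 0.002
```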

Other Applications

As an alternative to applications described in this document which use a gradient intractable perceptual metric, the present invention may be re-purposed for applications relating to quantisation. In an application relating to quantisation, we can use a proxy network to learn any gradient intractable function in machine learning. So, as an alternative to the perceptual metric, the quantisation (round) function may be used. A quantisation (round) function may be used in our pipeline on the latent space to convert it to a quantised latent space during encoding. This is a problem for training, as a quantisation (round) function does not have usable gradients (its derivative is zero almost everywhere). It is possible to learn the quantisation (round) function using a proxy neural network (since we always know the ground truth values) and use this network (which allows gradients to be propagated) for quantisation during training. The method is similar to that described in the algorithms 1.1, 1.2 and 1.3, but the intractable gradient function is now the quantisation (round) function.
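A minimal sketch of learning a differentiable proxy for the round function from ground-truth pairs (y, round(y)) follows. To keep it short and deterministic, it fits a model with fixed sinusoidal features by least squares in place of a trained neural network; this substitution is an assumption, not the method of the text. The substitution works because round(y) − y is periodic with period 1. The proxy's derivative is available in closed form, so gradients can pass through it during training:

```python
import numpy as np

def features(y, n_terms):
    """Fixed periodic features sin(2*pi*k*y), k = 1..n_terms."""
    k = np.arange(1, n_terms + 1)
    return np.sin(2 * np.pi * np.outer(y, k))

def fit_round_proxy(n_terms=16, n_samples=20000, seed=0):
    """Fit round(y) - y from samples; the ground truth is always available."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(-3.0, 3.0, n_samples)
    target = np.round(y) - y
    return np.linalg.lstsq(features(y, n_terms), target, rcond=None)[0]

def round_proxy(y, coeffs):
    """Differentiable approximation of round(y)."""
    return y + features(y, len(coeffs)) @ coeffs

def round_proxy_grad(y, coeffs):
    """Closed-form derivative of round_proxy with respect to y."""
    k = np.arange(1, len(coeffs) + 1)
    return 1.0 + np.cos(2 * np.pi * np.outer(y, k)) @ (2 * np.pi * k * coeffs)

coeffs = fit_round_proxy()
y_test = np.linspace(-2.4, 2.4, 961)
approx = round_proxy(y_test, coeffs)
```

The approximation is poorest next to the jumps at half-integers (Gibbs ringing). Away from them it tracks round(y) closely, while its gradient is non-zero and averages to 1 over each unit interval.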

As an alternative to applications described in this document which use a gradient intractable perceptual metric, the present invention may be re-purposed for applications relating to a runtime device proxy. Techniques such as NAS (Neural Architecture Search) can be used to drive the search for an efficient architecture, using the measured runtime on a device as the loss function to minimise. However, this is currently not feasible, as it is too time-consuming to execute each model on a device to assess its runtime in every iteration of training. We use a proxy network to learn the mapping from architecture to runtime. This proxy is trained by generating 1000 (or at least 1000) architectures randomly, timing their runtime on a device, and then fitting a neural network to this data. Having this runtime proxy allows us to obtain the runtimes of architectures easily, within a few seconds of processing (e.g. through the forward pass of the proxy network). This proxy can then be used stand-alone to assess the run timings of architectures, or in a NAS-based setting to drive learning.
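A minimal sketch of the runtime-proxy workflow follows, with two substitutions made to keep the example self-contained. Device timing is replaced by a hypothetical synthetic latency model (`measured_runtime_ms`). The proxy neural network is replaced by a linear least-squares fit on hand-chosen architecture features. An "architecture" here is just a hypothetical (number of layers, width) pair:

```python
import numpy as np

def measured_runtime_ms(n_layers, width, rng):
    """Hypothetical stand-in for timing an architecture on a device (with noise)."""
    return 0.5 + 2e-5 * n_layers * width ** 2 + rng.normal(0.0, 0.05)

def arch_features(n_layers, width):
    return np.array([1.0, n_layers, width, n_layers * width ** 2], dtype=float)

def fit_runtime_proxy(n_archs=1000, seed=0):
    """Sample random architectures, 'time' them, and fit the proxy to the timings."""
    rng = np.random.default_rng(seed)
    archs = [(int(rng.integers(2, 20)), int(rng.integers(16, 257))) for _ in range(n_archs)]
    feats = np.stack([arch_features(n, w) for n, w in archs])
    times = np.array([measured_runtime_ms(n, w, rng) for n, w in archs])
    return np.linalg.lstsq(feats, times, rcond=None)[0]

def predict_runtime_ms(coeffs, n_layers, width):
    """Proxy forward pass: a cheap runtime estimate without touching the device."""
    return float(arch_features(n_layers, width) @ coeffs)

coeffs = fit_runtime_proxy()
print(predict_runtime_ms(coeffs, 10, 128))  # synthetic ground truth is about 3.78 ms
```

In a NAS loop, `predict_runtime_ms` (or a differentiable network in its place) would be queried for every candidate instead of running the model on the device.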

Note

It is to be understood that the arrangements referenced herein are only illustrative of the application of the principles of the present inventions. Numerous modifications and alternative arrangements can be devised without departing from the spirit and scope of the present inventions. While the present inventions are shown in the drawings and fully described with particularity and detail in connection with what is presently deemed to be the most practical and preferred examples of the inventions, it will be apparent to those of ordinary skill in the art that numerous modifications can be made without departing from the principles and concepts of the inventions as set forth herein.

The invention claimed is:
 1. A computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the method of training using a plurality of scales for input images from the set of training images, the method including the steps of: (i) receiving an input image x of the set of training images and generating one or more images which are derived from x to make a multiscale set of images {x_i} which includes x; (ii) the image generative network f_θ generating an output image x̂_i from an input image x_i ∈ {x_i}, without tracking gradients for f_θ; (iii) the proxy network outputting an approximated function output ŷ_i, using the x_i and the x̂_i as inputs; (iv) the gradient intractable perceptual metric outputting a function output y_i, using the x_i and the x̂_i as inputs; (v) evaluating a loss for the proxy network, using the y_i and the ŷ_i as inputs, and including the evaluated loss for the proxy network in a loss array for the proxy network; (vi) repeating steps (ii) to (v) for all the images x_i in the multiscale set of images {x_i}; (vii) using backpropagation to compute gradients of parameters of the proxy network with respect to an aggregation of the loss array assembled in executions of step (v); (viii) optimizing the parameters of the proxy network based on the results of step (vii), to provide an optimized proxy network; (ix) the image generative network f_θ generating an output image x̂_i from an input image x_i ∈ {x_i}; (x) the optimized proxy network outputting an optimized approximated function output ŷ_i, using the x_i and the x̂_i as inputs; (xi) evaluating a loss for the generative network f_θ, using the x_i, the x̂_i and the optimized approximated function output ŷ_i as inputs, and including the evaluated loss for the generative network f_θ in a loss array for the generative network f_θ; (xii) repeating steps (ix) to (xi) for all the images x_i in the multiscale set of images {x_i}; (xiii) using backpropagation to compute gradients of parameters of the generative network f_θ with respect to an aggregation of the loss array assembled in executions of step (xi); (xiv) optimizing the parameters of the generative network f_θ based on the results of step (xiii), to provide an optimized generative network f_θ, and (xv) repeating steps (i) to (xiv) for each member of the set of training images.
 2. The method of claim 1, wherein the one or more images which are derived from x to make a multiscale set of images {x_i} are derived by downsampling.
 3. The method of claim 1, wherein the generative network f_θ includes an encoder, which encodes (by performing lossy encoding) an input image x into a bitstream, and includes a decoder, which decodes the bitstream into an output image x̂.
 4. The method of claim 1, wherein the method includes an iteration of a training pass of the generative network, and a training pass of the proxy network.
 5. The method of claim 1, wherein the generative and proxy networks have separate optimizers.
 6. The method of claim 1, wherein for the case of proxy network optimization, gradients do not flow through the generative network.
 7. The method of claim 1, wherein the method is used for learned image or video compression.
 8. The method of claim 1, wherein the gradient intractable perceptual metric is a perceptual loss function.
 9. The method of claim 1, wherein the gradient intractable perceptual metric is VMAF, VIF, DLM or IFC, or a mutual information based estimator.
 10. The method of claim 1, wherein the generative network includes a compression network, wherein a term is added to the total loss of the compression network to stabilise the initial training of the compression network.
 11. The method of claim 1, wherein the generative loss includes a generic distortion loss which includes one or more stabilisation terms.
 12. The method of claim 1, wherein the stabilisation terms include Mean Squared Error (MSE) or a combination of analytical losses with weighted deep-embeddings of a pre-trained neural network.
 13. The method of claim 1, wherein a perceptual quality score is assigned to the image at each scale and is aggregated by an aggregation function.
 14. The method of claim 1, wherein the set of images includes a downsampled image that has been downsampled by a factor of two in each dimension.
 15. The method of claim 1, wherein the set of images includes a downsampled image that has been downsampled by a factor of four in each dimension.
 16. The method of claim 1, wherein the mean of the ŷ_i is used to train the image generative network by attempting to maximise or minimise the mean of the ŷ_i using stochastic gradient descent.
 17. The method of claim 1, wherein the predictions y_i are used to train the proxy network to force its predictions to be closer to an output of the perceptual metric, using stochastic gradient descent.
 18. The method of claim 1, wherein for each image x, an RGB image is provided.
 19. A computer system configured to train an image generative network f_θ for a set of training images, in which the system generates an output image x̂ from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, wherein the computer system is configured to: (i) receive an input image x from the set of training images and generate one or more images which are derived from x to make a multiscale set of images {x_i} which includes x; (ii) use the image generative network f_θ to generate an output image x̂_i from an input image x_i ∈ {x_i}, without tracking gradients for f_θ; (iii) use the proxy network to output an approximated function output ŷ_i, using the x_i and the x̂_i as inputs; (iv) use the gradient intractable perceptual metric to output a function output y_i, using the x_i and the x̂_i as inputs; (v) evaluate a loss for the proxy network, using the y_i and the ŷ_i as inputs, and to include the evaluated loss for the proxy network in a loss array for the proxy network; (vi) repeat (ii) to (v) for all the images x_i in the multiscale set of images {x_i}; (vii) use backpropagation to compute gradients of parameters of the proxy network with respect to an aggregation of the loss array assembled in executions of (v); (viii) optimize the parameters of the proxy network based on the results of (vii), to provide an optimized proxy network; (ix) use the image generative network f_θ to generate an output image x̂_i from an input image x_i ∈ {x_i}; (x) use the optimized proxy network to output an optimized approximated function output ŷ_i, using the x_i and the x̂_i as inputs; (xi) evaluate a loss for the generative network f_θ, using the x_i, the x̂_i and the optimized approximated function output ŷ_i as inputs, and to include the evaluated loss for the generative network f_θ in a loss array for the generative network f_θ; (xii) repeat (ix) to (xi) for all the images x_i in the multiscale set of images {x_i}; (xiii) use backpropagation to compute gradients of parameters of the generative network f_θ with respect to an aggregation of the loss array assembled in executions of (xi); (xiv) optimize the parameters of the generative network f_θ based on the results of (xiii), to provide an optimized generative network f_θ, and (xv) repeat (i) to (xiv) for each member of the set of training images.
 20. A computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient intractable perceptual metric that evaluates a quality of an output image x̂ given an input image x, the method including the steps of: (i) the image generative network f_θ generating an output image x̂ from an input image x of the set of training images, without tracking gradients for f_θ; (ii) the proxy network outputting an approximated function output ŷ, using x and x̂ as inputs; (iii) the gradient intractable perceptual metric outputting a function output y, using x and x̂ as inputs; (iv) evaluating a loss for the proxy network, using y and ŷ as inputs; (v) using backpropagation to compute gradients of parameters of the proxy network with respect to the loss evaluated in step (iv); (vi) optimizing the parameters of the proxy network based on the results of step (v), to provide an optimized proxy network; (vii) the image generative network f_θ generating an output image x̂ from an input image x; (viii) the optimized proxy network outputting an optimized approximated function output ŷ, using x and x̂ as inputs; (ix) evaluating a loss for the generative network f_θ, using x, x̂ and the optimized approximated function output ŷ as inputs; (x) using backpropagation to compute gradients of parameters of the generative network f_θ with respect to the loss evaluated in step (ix); (xi) optimizing the parameters of the generative network f_θ based on the results of step (x), to provide an optimized generative network f_θ, and (xii) repeating steps (i) to (xi) for each member of the set of training images.